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^"*^ , We study a new image sensor that is reminiscent of traditional photographic film. Each pixel in the 

If} i sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. To 

analyze its performance, we formulate the oversampled binary sensing scheme as a parameter estimation 
problem based on quantized Poisson statistics. We show that, with a single-photon quantization threshold 
and large oversampling factors, the Cramer-Rao lower bound (CRLB) of the estimation variance approaches 
that of an ideal unquantized sensor, that is, as if there were no quantization in the sensor measurements. 
Furthermore, the CRLB is shown to be asymptotically achievable by the maximum likelihood estimator 
| (MLE). By showing that the log-likelihood function of our problem is concave, we guarantee the global 

0^ . optimality of iterative algorithms in finding the MLE. Numerical results on both synthetic data and images 

o 

reconstruction algorithm. They also suggest the potential application of the oversampled binary sensing 
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taken by a prototype sensor verify our theoretical analysis and demonstrate the effectiveness of our image 
reconstruction algorithm. They also suggest 
scheme in high dynamic range photography. 
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I. Introduction 

Before the advent of digital image sensors, photography, for the most part of its history, used film to 
record light information. At the heart of every photographic film are a large number of light-sensitive grains 
of silver-halide crystals [fl]. During exposure, each micron-sized grain has a binary fate: Either it is struck 
by some incident photons and becomes "exposed", or it is missed by the photon bombardment and remains 
"unexposed". In the subsequent film development process, exposed grains, due to their altered chemical 
properties, are converted to silver metal, contributing to opaque spots on the film; unexposed grains are 
washed away in a chemical bath, leaving behind them transparent regions on the film. Thus, in essence, 
photographic film is a binary imaging medium, using local densities of opaque silver grains to encode the 
original light intensity information. Thanks to the small size and large number of these grains, one hardly 
notices this quantized nature of film when viewing it at a distance, observing only a continuous gray tone. 

In this work, we study a new digital image sensor that is reminiscent of photographic film. Each pixel in 
the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. 
At the start of the exposure period, all pixels are set to 0. A pixel is then set to 1 if the number of 
photons reaching it during the exposure is at least equal to a given threshold q. One way to build such 
binary sensors is to modify standard memory chip technology, where each memory bit cell is designed to 
be sensitive to visible light |2). With current CMOS technology, the level of integration of such systems 
can exceed 10 9 ~ 10 10 (i.e., 1 giga to 10 giga) pixels per chip. In this case, the corresponding pixel sizes 
(around 50 nm 0) are far below the diffraction limit of light (see Section [TT] for more details), and thus the 
image sensor is over sampling the optical resolution of the light field. Intuitively, one can exploit this spatial 
redundancy to compensate for the information loss due to one -bit quantizations, as is classic in oversampled 
analog-to-digital (A/D) conversions EJ-Q. 

Building a binary sensor that emulates the photographic film process was first envisioned by Fossum JS], 
who coined the name "digital film sensor". The original motivation was mainly out of technical necessity. 
The miniaturization of camera systems calls for the continuous shrinking of pixel sizes. At a certain point, 
however, the limited full-well capacity (i.e., the maximum photon-electrons a pixel can hold) of small pixels 
becomes a bottleneck, yielding very low signal-to-noise ratios (SNRs) and poor dynamic ranges. In contrast, 
a binary sensor whose pixels only need to detect a few photon-electrons around a small threshold q has 
much less requirement for full-well capacities, allowing pixel sizes to shrink further. 

In this paper, we present a theoretical analysis of the performance of the binary image sensor, and 
propose an efficient and optimal algorithm to reconstruct images from the binary sensor measurements. 
Our analysis and numerical simulations demonstrate that the dynamic ranges of the binary sensors can be 
orders of magnitude higher than those of conventional image sensors, thus providing one more motivation 
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for considering this binary sensing scheme. 

Since photon arrivals at each pixel can be well-approximated by a Poisson random process whose rate is 
determined by the local light intensity, we formulate the binary sensing and subsequent image reconstruction 
as a parameter estimation problem based on quantized Poisson statistics. Image estimation from Poisson 
statistics has been extensively studied in the past, with applications in biomedical and astrophysical imaging. 
Previous work in the literature has used linear models ||9], multiscale models ifTOl . iTTTTl . and nonlinear 
piecewise smooth models lfl2l . |fl3l to describe the underlying images, leading to different (penalized) 
maximum likelihood and/or Bayesian reconstruction algorithms. The main difference between our work and 
previous works is that we have only access to one-bit quantized Poisson statistics. The binary quantization 
and spatial oversampling in the sensing scheme add interesting dimensions to the original problem. As we 
will show in Section [TTll the performance of the binary sensor depends on the intricate interplay of three 
parameters: the average light intensity, the quantization threshold q, and the oversampling factor. 

The binary sensing scheme studied in this paper also bears resemblance to oversampled A/D conversion 
schemes with quantizations (see, e.g., m-Q). Previous work on one -bit A/D conversions considers ban- 
dlimited signals or, in general, signals living in the range space of some overcomplete representations. The 
effect of quantization is often approximated by additive noise, which is then mitigated through noise shaping 
El, El, or dithering Q, followed by linear reconstruction. In our work, the binary sensor measurements 
are modeled as one-bit quantized versions of correlated Poisson random variables (instead of deterministic 
signals), and we directly solve the statistical inverse problem by using maximum likelihood estimation, 
without any additive noise approximation. 

The rest of the paper is organized as follows. After a precise description of the binary sensing model in 
Section JH we present three main contributions in this paper: 

1. Estimation performance: In Section [TTlJ we analyze the performance of the proposed binary sensor 
in estimating a piecewise constant light intensity function. In what might be viewed as a surprising result, 
we show that, when the quantization threshold q = 1 and with large oversampling factors, the Cramer- 
Rao lower bound (CRLB) |[T4ll of the estimation variance approaches that of unquantized Poisson intensity 
estimation, that is, as if there were no quantization in the sensor measurements. Furthermore, the CRLB 
can be asymptotically achieved by a maximum likelihood estimator (MLE), for large oversampling factors. 
Combined, these two results establish the feasibility of trading spatial resolutions for higher quantization 
bit depth. 

2. Advantage over traditional sensors: We compare the oversampled binary sensing scheme with tradi- 
tional image sensors in Section IIII-CI Our analysis shows that, with sufficiently large oversampling factors, 
the new binary sensor can have higher dynamic ranges, making it particularly attractive in acquiring scenes 
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containing both bright and dark regions. 

3. Image reconstruction: Section JV] presents an MLE-based algorithm to reconstruct the light intensity 
field from the binary sensor measurements. As an important result in this work, we show that the log- 
likelihood function in our problem is always concave for arbitrary linear field models, thus ensuring the 
achievement of global optimal solutions by iterative algorithms. For numerically solving the MLE, we 
present a gradient method, and derive efficient implementations based on fast signal processing algorithms 
in the polyphase domain lfl~5ll . ifToll . This attention to computational efficiency is important in practice, due 
to extremely large spatial resolutions of the binary sensors. 

Section [V] presents numerical results on both synthetic data and images taken by a prototype device ifTTl . 
These results verify our theoretical analysis on the binary sensing scheme, demonstrate the effectiveness of 
our image reconstruction algorithm, and showcase the benefit of using the new binary sensor in acquiring 
scenes with high dynamic ranges. 

To simplify the presentation we base our discussions on a one -dimensional (1-D) sensor array, but all 
the results can be easily extended to the 2-D case. 

II. Imaging by Oversampled Binary Sensors 
A. Diffraction Limit and Linear Light Field Models 

In this section, we describe the binary imaging scheme studied in this work. Consider a simplified camera 



model shown in Fig. |l(a)| We denote by \q(x) the incoming light intensity field (i.e., the radiance map). 
By assuming that light intensities remain constant within a short exposure period, we model the field as 
only a function of the spatial variable x. Without loss of generality, we assume that the dimension of the 
sensor array is of one spatial unit, i.e., < x < 1. 

After passing through the optical system, the original light field Xq(x) gets filtered by the lens, which 
acts like a linear system with a given impulse response. Due to imperfections (e.g., aberrations) in the lens, 
the impulse response, a.k.a. the point spread function (PSF) of the optical system, cannot be a Dirac delta, 
thus, imposing a limit on the resolution of the observable light field. However, a more fundamental physical 
limit is due to light diffraction (THJ. As a result, even if the lens is ideal, the PSF is still unavoidably a 
small blurry spot [see, for example, Fig. |l(b)] . In optics, such diffraction-limited spot is often called the 
Airy disk |[T8l , whose radius R a can be computed as 

Ra = 1.22 wf, 

where w is the wavelength of the light and / is the F-number of the optical system. 

Example 1: At wavelength w = 420 nm (i.e., for blue visible light) and / = 2.8, the radius of the Airy 
disk is 1.43 /Lim. Two objects with distance smaller than R a cannot be clearly separated by the imaging 
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(b) 



Fig. 1. The imaging model, (a) The simplified architecture of a diffraction-limited imaging system. Incident light field Ao(a;) 
passes through an optical lens, which acts like a linear system with a diffraction-limited point spread function (PSF). The result is 
a smoothed light field \(x), which is subsequently captured by the image sensor, (b) The PSF (Airy disk) of an ideal lens with a 
circular aperture. 



system as their Airy disks on the image sensor start blurring together. Current CMOS technology can already 
make standard pixels smaller than R a , reaching sizes ranging from 0.5 //m to 0.7 fim |[T9l . In the case 
of binary sensors, the simplicity of each pixel allows the feature size to be further reduced. For example, 
based on standard memory technology, each memory bit-cell {i.e., pixel) can have sizes around 50 nm O, 
making it possible to substantially oversample the light field. 

In what follows, we denote by \(x) the diffraction-limited {i.e., "observable") light intensity field, which 
is the outcome of passing the original light field \q(x) through the lens. Due to the lowpass (smoothing) 
nature of the PSF, the resulting \{x) has a finite spatial-resolution, i.e., it has a finite number of degrees 
of freedom per unit space. 

Definition 1 (Linear field model): In this work, we model the diffraction-limited light intensity field as 

\(x) = — ) c n ip(Nx-n), (1) 

n=0 

where ip(x) is a nonnegative interpolation kernel, N is a given integer, r is the exposure time, and 
{c n : c n > 0} is a set of free variables. 

Remark 1: The constant N/t in front of the summation is not essential, but its inclusion here leads to 
simpler expressions in our later analysis. 

The function \(x) as defined in £1} has N degrees of freedom. To guarantee that the resulting light fields 
are physically meaningful, we require both the interpolation kernel ip{x) and the expansion coefficients 
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{c n } to be nonnegative. Some examples of the interpolation kernels <f(x) include the box function, 

1, ifO<x<l; 



(2) 

0, otherwise, 



cardinal B -splines EOl . 

/ \ / k\ 

(fe+l) times 



and the squared sine function, sin 2 (ir{x — |)) / (tv(x — |)) 2 - 



B. Sampling the Light Intensity Field 

The image sensor in Fig. |l(a)| works as a sampling device of the light intensity field \{x). Suppose that 
the sensor consists of M pixels per unit space, and that the mth pixel covers the area between ^jjp], 
for < m < M. We denote by s m the total light exposure accumulated on the surface area of the mth 

pixel within an exposure time period [0,r]. Then, 

r f (m+l)/M 

del 



Sm = / A(x) dx dt 

JO Jm./M 

= T(\(x),p(Mx-m)), (4) 

where (3{x) is the box function defined in © and (•, •) represents the standard L 2 -inner product. Substitute 
the light field model CO) into the above equality, 

N x - 

s m = r(- ) c n (p(Nx - n),(3(Mx - m)) 

T — ' 

n 

= c n {Nip(Nx - ri),P(Mx - m)) 

n 

(M<i+I!> -,„)), <5) 

n ^ ' 

where (f5]l is obtained through a change of variables (Nx — n) — > x. 

Definition 2: The spatial over sampling factor, denoted by K, is the ratio between the number of pixels 
per unit space and the number of degrees of freedom needed to specify the light field X(x) in (Q]), i.e., 

In this work, we are interested in the "oversampled" case where K > 1. Furthermore, we assume that 
K is an integer for simplicity of notation. Using ©, and by introducing a discrete filter 

9m = {<p(x),(3(Kx - m)), m <E Z, (7) 



YANG et ah: BITS FROM PHOTONS: OVERSAMPLED IMAGE ACQUISITION USING BINARY POISSON STATISTICS 



7 




Binary sensing 



Photon Counting 



Quantization 



Fig. 2. The signal processing block diagram of the imaging model studied in this paper. In the first step, the light exposure value 
s m at the mth pixel is related to the expansion coefficients c n through a concatenation of upsampling and filtering operations. 
Subsequently, the image sensor converts {s m } into quantized measurements {b m } (see Fig.[3]and the discussions in Section Hl-CI 
for details of this second step). 



we can simplify (f5]) as 



Cn 9m— Kn- 



(8) 



The above equality specifies a simple linear mapping from the expansion coefficients {c n } of the light 
field to the light exposure values {s m } accumulated by the image sensor. Readers familiar with multirate 
signal processing |[T5l . |[T6l will immediately recognize that the relation in © can be implemented via a 
concatenation of upsampling and filtering, as shown in the left part of Fig. [2] This observation can also be 
verified by expressing © in the z-transform domain 



S(z) = Y,CnZ- Kn G(z) = C(z K )G(z), 



(9) 



and using the fact that C(z ) is the z-transform of the i^-fold upsampled version of c n . In Section UV] we 
will further study the signal processing block diagram in Fig. |2] to derive efficient implementations of the 
proposed image reconstruction algorithm. 

Example 2: The discrete filter g m is completely specified by the interpolation kernel (p(x) and the 
oversampling factor K. As a simple case, when the kernel ip(x) = f3(x), we can compute from d7J that 



1/K, for < m < K; 
0, otherwise. 



(10) 



C. Binary Sensing and One-Bit Poisson Statistics 

Fig. [3] illustrates the binary sensor model. Recall from (|4) that {s m } denote the exposure values accumu- 
lated by the sensor pixels. Depending on the local values of {s rn }, each pixel (depicted as "buckets" in the 
figure) collects a different number of photons hitting on its surface. In what follows, we denote by y m the 
number of photons impinging on the surface of the mth pixel during an exposure period [0, r]. The relation 
between s rn and the photon count y rn is stochastic. More specifically, y m can be modeled as realizations 
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Fig. 3. The model of the binary image sensor. The pixels (shown as "buckets") collect photons, the numbers of which are compared 
against a quantization threshold q. In the figure, we illustrate the case when q = 2. The pixel outputs are binary: b m = 1 (i.e., 
white pixels) if there are at least two photons received by the pixel; otherwise, b m — (i.e., gray pixels). 



of a Poisson random variable Y m , whose intensity parameter is equal to s m , i.e., 

= y m ; s m ) = € . Sm , for y m G Z+ U {0} . 



(11) 



It is a well-known property of the Poisson process that E[y m ] = s m . Thus, the average number of photons 
captured by a given pixel is equal to the local light exposure s m . 

As a photosensitive device, each pixel in the image sensor converts photons to electrical signals, whose 
amplitude is proportional to the number of photons impinging on that pixelQ In a conventional sensor 
design, the analog electrical signals are then quantized by an A/D converter into 8 to 14 bits (usually the 
more bits the better). In this work, we study a new sensor design using the following binary (i.e., one-bit) 
quantization scheme. 

Definition 3 (Binary Quantization): Let q > 1 be an integer threshold. A binary quantizer is a mapping 
Q : Z+ U {0} i — ► {0, 1}, such that 

i, if y > q; 

0, otherwise. 



Q(y) 



In Fig. [3l we illustrate the binary quantization scheme. White pixels in the figure show Q(y m ) = 1 and 



def 

gray pixels show Q(y m ) = 0. We denote by b m = Q(y m ), b rn € {0, 1}, the quantized output of the mth 
pixel. Since the photon counts {y m } are drawn from random variables {y m }, so are the binary sensor output 



'The exact ratio between these two quantities is determined by the quantum efficiency of the sensor. 
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{b m }, from the random variables |-B m = Q(Y m )\. Introducing two functions 

= ^) d = f i-E^- s (12) 

fc=0 fc=0 

we can write 

P(#m = b m \S m ) =Pb m (Sm), b m € {0, 1} . (13) 

Remark 2: The noise model considered in this paper is that of Poisson noise. In practice, the performance 
of image sensors is also influenced by thermal noise, which in our case can be modeled as random bit- 
flipping in the binary sensor measurements. Due to space constraints, we leave further discussions on this 
additional noise source and its impact on reconstruction performance to a follow-up work. 



D. Multiple Exposures and Temporal Oversampling 

Our previous discussions focus on the case of acquiring a single frame of quantized measurements during 
the exposure time [0,r]. As an extension, we can consider multiple exposures and acquire J consecutive 
and independent frames. The exposure time for each frame is set to r/J, so that the total acquisition time 
remains the same as the single exposure case. In what follows, we call J the temporal oversampling factor. 

As before, we assume that t<1 and thus light intensities \{x) stay constant within the entire acquisition 
time [0, t]. For the jth frame (0 < j < J), we denote by Sj >m the light exposure at the mth pixel. Following 
the same derivations as in Section III-BL we can show that 

1 x 

s h m = J Cn 9m-Kn, for all j, (14) 

n 

where {c n } are the expansion coefficients of the light field X(x), and g m is the discrete filter defined in 
fZ]). The only difference between (fT4l and © is the extra factor of 1/J, due to the change of exposure 
time from r to r/J. In the z-domain, similar to ©, 

Sj{z) = jC(z K )G(z). (15) 

In what follows, we establish an equivalence between temporal oversampling and spatial oversampling. 
More precisely, we will show that an M-pixel sensor taking J independent exposures (i.e., with J-times 
oversampling in time) is mathematically equivalent to a single sensor consisting of MJ pixels. 

First, we introduce a new sequence s m ,0 < m < MJ, constructed by interlacing the J exposure 
sequences {sj, m }- For example, when J = 2, the new sequence is 



so,o 


j s l,0, 


so,i 


, • • • , 




) S\,mi • • • i 


Sq,m-\ 



, Sl,M-l, 
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where {so, m } and {si iTn } alternate. In general, s m can be obtained as 

~ def 

s m = Sj,n, m = Jn + j, < j < J, < n < M. (16) 

In multirate signal processing, the above construction is called the polyphase representation |[T5l . |[T6l . and 
its alternating subsequences {sj, m }jZo the polyphase components. 
Proposition 1: Let g m be a filter whose z-transform 

G(z) = G(z J )(l + z~ 1 + ... + z-V-Qyj, (17) 

where G(z) is the z-transform of the filter g m defined in (O. Then, 

Sm = ^2 ° n 9m-KJn- (18) 
ii 

Proof: See Appendix lAl ■ 
Remark 3: Proposition[I]formally establishes the equivalence between spatial and temporal oversampling. 
We note that (TT8T ) has exactly the same form as ©, and thus the mapping from {c n } to {s m } can be 
implemented by the same signal processing operations shown in Fig. [2] — we only need to change the 
upsampling factor from K to KJ and the filter from g m to g m . In essence, by taking J consecutive 
exposures with an M-pixel sensor, we get the same light exposure values {s m }, as if we had used a more 
densely packed sensor with MJ pixels. 

Remark 4: Taking multiple exposures is a very effective way to increase the total oversampling factor of 
the binary sensing scheme. The key assumption in our analysis is that, during the J consecutive exposures, 
the light field remains constant over time. To make sure this assumption holds for arbitrary values of J, we 
set the exposure time for each frame to t/J, for a fixed and small r. Consequently, the maximum temporal 
oversampling factor we can achieve in practice will be limited by the readout speed of the binary sensor. 

Thanks to the equivalence between spatial and temporal oversampling, we only need to focus on the 
single exposure case in our following discussions on the performance of the binary sensor and image 
reconstruction algorithms. All the results we obtain extend directly to the multiple exposure case. 

III. Performance Analysis 

In this section, we study the performance of the binary image sensor in estimating light intensity 
information, analyze the influence of the quantization threshold and oversampling factors, and demonstrate 
the new sensor's advantage over traditional sensors in terms of higher dynamic ranges. In our analysis, 
we assume that the light field is piecewise constant, i.e., the interpolation kernel ip(x) in (Q]) is the box 
function (3(x). This simplifying assumption allows us to derive closed-form expressions for several important 
performance measures of interest. Numerical results in Section [V] suggest that the results and conclusions 
we obtain in this section applies to the general linear field model in (Q]) with different interpolation kernels. 
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A. The Cramer-Rao Lower Bound (CRLB) of Estimation Variances 

From Definition [T] reconstructing the light intensity field A(x) boils down to estimating the unknown 
deterministic parameters {c n }. Input to our estimation problem is a sequence of binary sensor measurements 
{b m }, which are realizations of Bernoulli random variables {B m }. The probability distributions of {B m } 
depend on the light exposure values {s m }, as in (fT3T ). Finally, the exposure values {s m } are linked to the 
light intensity parameters {c n } in the form of ([8]). 

Assume that the light field X(x) is piecewise constant. We have computed in Example [2] that, under this 
case, the discrete filter g m used in © is a constant supported within [0,K — 1], as in ( TTOb - The mapping 
([8]) between {c n } and {s m } can now be simplified as 

s m = c n /K, for nK < m < (n + l)K. (19) 

We see that the parameters {c n } have disjoint regions of influence, meaning, Co can only be sensed by a 
group of pixels {sq, . . . , sk-i}, c\ by {sk, • • • , S2K-1}, and so on. Consequently, the parameters {c n } can 
be estimated one-by-one, independently of each other. 

In what follows, and without loss of generality, we focus on estimating cq from the block of binary 
measurements b = [bo, . . . ,bK-i] T - For notational simplicity, we will drop the subscript in cq and use c 
instead. To analyze the performance of the binary sensing scheme, we first compute the CRLB lfl4l . which 
provides a theoretical lower bound on the variance of any unbiased estimator. 

Denote by Cb(c) the likelihood function of observing K binary sensor measurement b. Then, 

C b {c) = F{B m = b m , < m < K; c), 

K-l 

= H ¥(B m = b m ;c), (20) 

m=0 
K-l 

= H p b Jc/K), (21) 

m=0 

where (1201 is due to the independence of the photon counting processes at different pixel locations, and d2Tb 
follows from (TT3T > and (fT9l ). Defining K\ (0 < K\ < K) to be the number of "l"s in the binary sequence 
b, we can simplify (l2Tb as 

C b (c) = ( Pl (c/K)) Kl (p (c/K)) K - Kl . (22) 

Proposition 2: The CRLB of estimating the light intensity c from K binary sensor measurements with 
threshold q > 1 is 
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Proof: See Appendix iBl ■ 
It will be interesting to compare the performance of our binary image sensor with that of an ideal sensor 
which does not use quantization at all. To that end, consider the same situation as before, where we use K 
pixels to observe a constant light intensity value c. The light exposure s m at each pixel is equal to c/K, 
as in ( fT9l ). Now, unlike the binary sensor which only takes one-bit measurements, consider an ideal sensor 
that can perfectly record the number of photon arrivals at each pixel. By referring to Fig. [3] the sensor 
measurements in this case will be {y m }, whose probability distributions are given in (fill . 
In Appendix O we compute the CRLB of this unquantized sensing scheme as 

CRLB ldeal (K) = c, (24) 

which is natural and reflects the fact that the variance of a Poisson random variable is equal to its mean 
(i.e., c, in our case). 

To be sure, we always have CRLB^, m (K , q) > CRLBjd e ai(-fO, f° r arbitrary oversampling factor K and 
quantization threshold q. This is not surprising, as we lose information by one-bit quantizations. In practice, 
the ratio between the two CRLBs provides a measure of performance degradations incurred by the binary 
sensors. What is surprising is that the two quantities can be made arbitrarily close, when q = 1 and K is 
large, as shown by the following proposition. 

Proposition 3: For q = 1, 

CRLB bin (K, q) = c + — + O , (25) 

which converges to CRLBid ea i(-?0 as the oversampling factor K goes to infinity. For q > 2, 

CRLB bin (K, </)/CRLB ideal (K) > 1.31, (26) 
and lim^-^ooCRLBbh^K, g)/CRLBi d eai(i^) = oo. 



Proof: Specializing the expression (1231 for q = 1, we get 

CRLB bi JK, l)=c(l + — + + + . . . 
V ' ' \ 2K 31K 2 4\K 3 

and thus ( |25l) . The statements for cases when q > 2 are shown in Appendix |D] ■ 

Proposition [3] indicates that it is feasible to use oversampling to compensate for information loss due to 

binary quantizations. It follows from (|25T ) that, with large oversampling factors, the binary sensor operates 

as if there were no quantization in its measurements. It is also important to note that this desirable tradeoff 

between spatial resolution and estimation variance only works for a single -photon threshold (i.e., q = 1). For 

other choices of the quantization threshold, the "gap" between CRLBbi n (-ft', q) and CRLB;d e ai(^), measured 

in terms of their ratio, cannot be made arbitrarily small, as shown in (l26l ). In fact, it quickly tends to infinity 

as the oversampling factor K increases. 
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The results of Proposition [3] can be intuitively understood as follows: The expected number of photons 
collected by each pixel during light exposure is equal to s m = c/K. As the oversampling factor K goes to 
infinity, the mean value of the Poisson distribution tends to zero. Consequently, most pixels on the sensor 
will only get zero or one photon, with the probability of receiving two or more photons at a pixel close to 
zero. In this case, with high probability, a binary quantization scheme with threshold q = 1 does not lose 
information. In contrast, if q > 2, the binary sensor measurements will be almost uniformly zero, making 
it nearly impossible to differentiate between different light intensities. 

B. Asymptotic Achievability of the CRLB 

In what follows, we show that, when q = 1, the CRLB derived in (|23l ) can be asymptotically achieved by 
a simple maximum likelihood estimator (MLE). Given a sequence of K binary measurements b, the MLE 
we seek is the parameter that maximizes the likelihood function £b(c) in d22l) . More specifically, 

^ def 

cml(&) = argmax £ b (c) 

0<c<S 

= argmax (l - p (c/K)) Kl ( Po (c/K)) K " Kl , (27) 

0<c<S 

where we substitute p\{c/K) in (1221) by its equivalent form 1 — po(c/K). The lower bound of the search 
domain c > is chosen according to physical constraints, i.e., the light field can not take negative values. 
The upper bound c < S becomes necessary when K\ = K, in which case the likelihood function Ct,(c) = 
Pi{c/K) K is monotonically increasing with respect to the light intensity level c. 
Lemma 1: The MLE solution to d27l) is 

^ \Kp { ^\l- Kx/K), ifO<Kx<K(l- Po (S/K)), 

cml(&) = < (28) 
I S, otherwise, 

where p^ 1 \x) is the inverse function of po{x). 

Remark 5: From the definition in (fT2l . we can easily verify that ^rPo(^) < for all x > 0. It follows 
that the function po(x) is strictly decreasing for x > and that the inverse p^ ^(x) is well-defined. For 
example, when q = 1, we have po(x) = e~ x and thus p L ~ 1 \x) = — log(x). In this particular case, and for 
K\ <C K, we have cml(^) = —if log (1 — K\jK) « K\. It follows that we can use the sum of the K 
binary measurements as a first-order approximation of the light intensity estimation. 

Proof: At the two extreme cases, when K\ = or K\ = K, it is easy to see that d28l is indeed the 
solution to (|27T ). Next, we assume that < K\ < K. 

Computing the derivative of £b(c) and setting it to zero, we can verify that the equation ^>Cb(c) = 
has a single solution at 

c max = Kp [ ~ 1] (l- Kx/K). 
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Since Cb(c) > and £5(0) = lim^oo £5(0) = 0, we conclude that the likelihood function Cb{c) achieves 
its maximum value at c max - Finally, the MLE solution cml = min{c max , S}, and thus, we have (|28T ). ■ 
Theorem 1: When q = 1, we have 

E[qul(&)] = c + £1 + 0(1/1?), forc<5-2, (29) 
/ \S-i 

where |ei| < 2c e c ( ] . Meanwhile, the mean squared error (MSE) of the estimator approaches 
CRLBjdeai, i.e., 

E[(d ML (b)-c) 2 ] = c + e 2 + 0(l/K), for c<S -2, (30) 
5-2 



where \e 2 \ < 2c(c+ l)e~ c 

Remark 6: It is easy to verify that, for fixed c, the two terms £1 and £2 converge (very quickly) to as S 
tends to infinity. It then follows from d29l and (|30l that the MLE is asymptotically unbiased and efficient, 
in the sense that 

lim lim E[cml(&)] = c and lim lim E [(cml(6) — c) 2 l = c. 

We leave the formal proof of this theorem to Appendix |E] Its main idea can be summarized as follows. As 
K goes to infinity, the area of each pixel tends to zero, so does the average number of photons arriving 
at that pixel. As a result, most pixels on the sensor will get only zero or one photon during exposure. A 
single-photon binary quantization scheme can record perfectly the pattens of "0"s and "l"s on the sensor. 
It loses information only when a pixel receives two or more photons, but the probability of such events 
tends to zero as K increases. 

Suppose, now, that we use a quantization threshold q > 2. In this case, as K tends to infinity, the binary 
responses of different pixels will almost always be "0", essentially obfuscating the actual light intensity 
values. This problem leads to poor performance in the MLE. As stated in the following proposition, the 
asymptotic MSE for q > 2 becomes c 2 instead of c. 

Proposition 4: When q > 2, the MLE is asymptotically biased, that is, for any fixed c and 5, 

lim E[c ML (b)]=0. (31) 



Meanwhile, the MSE becomes 



Proof: See Appendix [0 



lim E [(c ML (b) - cf] =c 2 . (32) 
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Fig. 4. Performance comparisons of three different sensing schemes ("BIN", "IDEAL", and "SAT") over a wide range of light 
exposure values c (shown in logarithmic scale). The dash-dot line (in red) represents the "IDEAL" scheme with no quantization; 
The solid line (in blue) corresponds to the "SAT" scheme with a saturation point set at C ma x = 9130 12 11 ; The four dashed lines 
(in black) correspond to the "BIN" scheme with q — 1 and different oversampling factors (from left to right, K = 2 13 , 2 14 , 2 15 
and 2 16 , respectively). 

C. Advantages over Traditional Sensors 

In what follows, we demonstrate the advantage of the oversampled binary sensing scheme, denoted by 
"BIN", in achieving higher dynamic ranges. We focus on the case where the quantization threshold is set to 
q = 1. For comparisons, we also consider the following two alternative sensing schemes: The first, denoted 
by "IDEAL", uses a single pixel to estimate the light exposure parameter {i.e., nonoversampled), but that 
pixel can record perfectly the number of photon arrivals during exposure; The second scheme, denoted by 
"SAT", is very similar to the first, with the addition of a saturation point C max , beyond which the pixel 
can hold no more photons. Note that in our discussions, the "SAT" scheme serves as an idealized model 
of conventional image sensors, for which the saturation is caused by the limited full-well capacity of the 
semiconductor device. The general trend of conventional image sensor design has been to pack more pixels 
per chip by reducing pixel sizes, leading to lower full-well capacities and thus lower saturation values. 

Fig. [4] compares performances of the three different sensing schemes {i.e., "BIN", "IDEAL", and "SAT") 
over a wide range of light exposure values. We measure the performances in terms of signal-to-noise ratios 
(SNRs), defined as 

SNR = 101og -E[r^]' 

where c is the estimation of the light exposure value we obtain from each of the sensing schemes. 

We observe that the "IDEAL" scheme (the red dash-dot line in the figure) represents an upper-bound of 
the estimation performance. To see this, denote by y the number of photons that arrive at the pixel during 
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exposure. Then y is a realization of a Poisson random variable Y with intensity equal to the light exposure 
value c, i.e., 

F(Y = y;c)-- 



Maximizing this function over c, we can compute the MLE for the "IDEAL" scheme as ctdeal(2/) = V- 
It is easy to verify that this estimator is unbiased, i.e., E[c"ideal(^)] = E[Y] = c, and that it achieves the 
ideal CRLB in (l24l ). i.e., var(c IDEAL (^)) = var(Y) = c. Accordingly, we can compute the SNR as 

SNR IDEAL = 101og 10 (c 2 /c) = 101og 10 (c), 

which appears as a straight line in our figure with the light exposure values c shown in logarithmic scale. 

The solid line in the figure corresponds to the "SAT" scheme, with a saturation point set at C max = 9130, 
which is the full well capacity of the image sensor reported in |[2Ti . The sensor measurement in this case 
is 2/sat = min{y, C max }, and the estimator we use is 

csat(2/sat) = 2/sat- (33) 

We can see that the "SAT" scheme initially has the same performance as "IDEAL". It remains this way 
until the light exposure value c approaches the saturation point C max , after which there is a drastic dropc 
in SNR. Denoting by SNR m ; n the minimum acceptable SNR in a given application, we can then define the 
dynamic range of a sensor as the range of c for which the sensor achieves at least SNR min . For example, 
if we choose SNR m ; n = 20 dB, then, as shown in the figure, the SAT scheme has a dynamic range from 
c = 10 2 to c pa 10 4 , or, if measured in terms of ratios, 100 : 1. 

Finally, the three dashed lines represent the "BIN" scheme with q = 1 and increasing oversampling 
factors (from left to right: K = 2 13 , 2 14 , 2 15 and 2 16 , respectively). We use the MLE given in (|28]> and 
plot the corresponding estimation SNRs. We see that, within a large range of c, the performance of the 
"BIN" scheme is very close to that of the "IDEAL" scheme that does not use quantization. This verifies our 
analysis in Theorem \T\ which states that the "BIN" scheme with a single -photon threshold can approach 
the ideal unquantized CRLB when the oversampling factor is large enough. Furthermore, when compared 
with the "SAT" scheme, the "BIN" scheme has a more gradual decrease in SNR when the light exposure 
values increase, and has a higher dynamic range. For example, when K = 2 16 , the dynamic range of the 
"BIN" scheme spans from c = 10 2 to c = 10 5 ' 8 , about two orders of magnitude higher than that of "SAT". 
In Section |Vj we will present a numerical experiment that points to a potential application of the binary 
sensor in high dynamic range photography. 

2 The estimator in ((33} is biased around c = C max . For a very narrow range of light intensity values centered around C ma x, the 
MSE of this biased estimator is lower than the ideal CRLB. Thus, there is actually a short "spike" in SNR right before the drop. 
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Remark 7: Note that K is the product of the spatial oversampling factor and the temporal oversampling 
factor. For example, the pixel pitch of the image sensor reported in ETI is 1.65/um. If the binary sensor is 
built on memory chip technology, with a pitch size of 50 nm (3J, then the maximum spatial oversampling 
factor is about 1089. To achieve K = 2 13 , 2 14 , 2 15 and 2 16 , respectively, as required in Fig. @] we then 
need to have temporal oversampling factors ranging from 8 to 60. Unlike traditional sensors which require 
multi-bit quantizers, the binary sensors only need one-bit comparators. This simplicity in hardware can 
potentially lead to faster readout speeds, making it practical to apply temporal oversampling. 



In the previous section, we studied the performance of the binary image sensor, and derived the MLE for 
a piecewise-constant light field model. Our analysis establishes the optimality of the MLE, showing that, 
with single-photon thresholding and large oversampling factors, the MLE approaches the performance of 
an ideal sensing scheme without quantization. In this section, we extends the MLE to the general linear 
field model in (Q]), with arbitrary interpolation kernels. As a main result of this work, we show that the 
log-likelihood function is always concave. This desirable property guarantees the global convergence of 
iterative numerical algorithms in solving the MLE. 

A. Image Reconstruction by MLE 

Under the linear field model introduced in Definition QJ reconstructing an image [i.e., the light field X(x)] 
is equivalent to estimating the parameters {c n } in ([TJ). As shown in ([8]), the light exposure values {s m } 
at different sensors are related to {c n } through a linear mapping, implemented as upsampling followed by 
filtering as in Fig. [2] Since it is linear, the mapping © can be written as a matrix-vector multiplication 



where s = [sq, si, . . . , sm-i] T , c = [co, c\, . . . , cn-i} t , and G is an M x N matrix representing the 
combination of upsampling (by K) and filtering (by g m ). Each element of s can then be written as 



where e m is the mth standard Euclidean basis vectorial 

Remark 8: In using the above notations, we do not distinguish between single exposure and multiple 
exposures, whose equivalence has been established by Proposition [TJ in Section ITl-Di In the case of multiple 
exposures, the essential structure of G — upsampling followed by filtering — remains the same. All we need 



IV. Optimal Image Reconstruction and Efficient Implementations 



s = G c 



(34) 




(35) 



3 Here we use zero-based indexing. Thus, eo = [1, 0, . . . , 0] T , ei = [0, 1, ... , 0] T , and so on. 
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to do is to replace s by the interlaced sequence {s m } constructed in ([Tol l, the oversampling factor K by 
K J, and the filter g m by g m in (fTTT ). 

Similar to our derivations in (|20l i and (|2T1 . the likelihood function given M binary measurements b = 
[bo, b\, . . . , 6m-i] T can be computed as 

M-l 

£ 6 (c) = ] [ F(B m = b m ; s m ) 



m=0 
M-l 



[ Pb JelGc), (36) 



m=0 

where (|36l l follows from (fT2l l and ([35l l. In our subsequent discussions, it is more convenient to work with 
the log-likelihood function, defined as 

M-l 

l b (c) = log£ 6 (c) = l °SPb m {e^ 1 Gc). (37) 

m=0 

For any given observation b, the MLE we seek is the parameter that maximizes Cb{c), or equivalently, 
lb{c). Specifically, 

def 

c M lO) = argmax l b (c) 

ce[o,s] N 

M-l 

= argmax V] logP6 m (e^Gc). (38) 

ce[o,5]« m=0 

The constraint c G [0, 5]^ means that every parameter c n should satisfy < c n < 5, for some preset 
maximum value S. 

Example 3: As discussed in Section [TTTl when the light field is piecewise-constant, different light field 
parameters {c n } can be estimated independently. In that case, the likelihood function has only one variable 
[see d22l )l and can be easily visualized. In Fig. [51 we plot £b(c) in (l22l ) and the corresponding log-likelihood 
function /5(c), under different choices of the quantization thresholds. We observe from the figures that the 
likelihood functions are not concave, but the log-likelihood functions indeed are. In what follows, we will 
show that this result is general, namely, the log-likelihood functions in the form of (1371 ) are always concave. 

Lemma 2: For any two integers i,j such that 0<i<j<ooor0<i<j = oo, the function 

k=i 

is concave on the interval x G [0, 00). 

Proof: See Appendix iGl ■ 
Theorem 2: For arbitrary binary sensor measurements b, the log-likelihood function 2&(c) defined in (l37l) 
is concave on the domain c G [0, S] . 
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(a) 




(b) 



Fig. 5. The likelihood and log-likelihood functions for piecewise-constant light fields, (a) The likelihood functions jCb(c), defined in 
J22l >. under different choices of the quantization thresholds q — 1,3, 5, respectively, (b) The corresponding log-likelihood functions. 
In computing these functions, we set the parameters in i22\ as follows: K = 12, i.e., the sensor is 12-times oversampled. The 
binary sensor measurements contain 10 "l"s, i.e., K\ = 10. 



Proof: It follows from the definition in (fl2l ) that, for any b m G {0, 1}, the function logp& m (s) is either 



9" 1 k 



or 



log 



s K e 



k\ 



(39) 



fc=0 k=q 

We can apply Lemma|2]in both cases, and show that {logpb m (s)} are concave functions for s > 0. Since the 
sum of concave functions is still concave and the composition of a concave function with a linear mapping 
(s m = eJ n Gc) is still concave, we conclude that the log-likelihood function defined in (I37T ) is concave. ■ 
In general, there is no closed-form solution to the maximization problem in (|38V An MLE solution has 
to be found through numerical algorithms. Theorem |2] guarantees the global convergence of these iterative 
numerical methods. 



B. Iterative Algorithm and Efficient Implementations 

We compute the numerical solution of the MLE by using a standard gradient ascent method. Denote by 
c( fc ) the estimation of the unknown parameter c at the kth step. The estimation c( fc+1 ) at the next step is 
obtained by 

c (*+i) = Pv (>) + 7fc V/ 6 (cW)) , (40) 

where V/f,(c( fe )) is the gradient of the log-likelihood function evaluated at c'% 7& is the step-size at the 
current iteration, and Pp is the projection onto the search domain V = [0,5]^. We apply Pp to ensure 
that all estimations of c lie in the search domain. 
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Taking the derivative of the log-likelihood function lb(c) in (1371 ). we can compute the gradient as 

T 



VZ 6 (c«) = G 



T 



fl tl ( S f ) ),D kl ( S f ) ),...,%J S S_ 1 ) 



(41) 



where = f [s^, . . . , s^_^\ T = Gc^ is the current estimation of the light exposure values, and 



D b (s) = ^-logp b (s) for b = 0,1. 
as 

For example, when q = 1, we have po(s) = e~ s and pi(s) = 1 — e~ s . In this case, Dq(s) = —1 and 
Z?i(s) = 1/(1 — e~ s ), respectively. 

The choice of the step size 7^ has significant influence over the speed of convergence of the above 
iterative algorithm. We follow ||9] by choosing, at each step, a 7^ so that the gradient vectors at the current 
and the next iterations are approximately orthogonal to each other. By assuming that the estimates s( fc+1 ) 
and s( fc ) at consecutive iterations are close to each other, we can use the following first-order approximation 



where 
It follows that 

v/ 6(c (^)) = g t [A B («r i) )A(«r i) ), . . ..A^c-fc?); 

« 7/ 6 (cW) + G^diag {H b0 (si%H bl (,«), . . . , H bM _ t (,<» (s^ - (42) 

Assuming that the gradient update c( fe ) + 7^ VZb(cW) is inside of the constraint set P, we can neglect the 
projection operator Pp in (|4P1 . and write 



a (fc+i) _ a (fc) = G ( C (* +1 ) - c«) = 7fc GvZ 6 ( c ( fe )). 



Substituting the above equality into (1421 . we get 

V/ 6 (c( fc+1 )) « V/ 6 (c«) + lk G T diag {^ (4 fc) ), ff 6l (4 fe) ), • • .,H bM _Xs { S-x)}GVl b {c^). 

Finally, by requiring that VZb(c( fc+1 )) be orthogonal to V/j,(e( fc )), we compute the optimal step size as 

||V/ 6 (c( fc ))|| 2 



lk 



diag I yJ-H^s™), ^-^^rili)} GVZ 6 (cW) 



(43) 



Remark 9: By definition, H b (s) (for 6 = 0, 1) are the second-order derivatives of concave functions (see 



Lemma [2]), and are thus nonpositive. Consequently, the terms y/—H b (s) in the denominator of (l43l are 
well-defined. 
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Fig. 6. Signal processing implementations of Ga and G T b. (a) The product Ga can be obtained by upsampling followed by 
filtering, (b) The product G T b can be obtained by filtering followed by downsampling. Note that the filter used in (b) is g- m , i.e., 
the "flipped" version of g m . (c) The polyphase domain implementation of (a), (d) The polyphase domain implementation of (b). 



At every iteration of the gradient algorithm, we need to update the gradient and the step size 7^. We 
see from (l4~TT i and (|43l l that the computations always involve matrix-vector products in the form of Ga and 
G T b, for some vectors a, b. The matrix G is of size M X N, where M is the total number of pixels. In 
practice, M will be in the range of 10 9 ~ 10 10 (i.e., gigapixel per chip), making it impossible to directly 
implement the matrix operations. Fortunately, the matrix G used in both formulae is highly structured, 
and can be implemented as upsampling followed by filtering (see our discussions in Section III-BI and the 
expression d8) for details). Similarly, the transpose G T can be implemented by filtering (by g- m ) followed 
by downsampling, essentially "flipping" all the operations in G. Fig. |6(a)| and Fig. |6(b)| summarizes these 
operations. 

We note that the implementations illustrated in Fig. |6(a) and Fig. |6(b)| are not yet optimized: For example, 
the input to the filter g m in Fig. 6(a)| is an upsampled sequence, containing mostly zero elements; In Fig. |6(b)[ 
we compute a full filtering operation (by g- m ), only to discard most of the filtering results in the subsequent 
downsampling step. All these deficiencies can be eliminated by using the tool of polyphase representations 
from multirate signal processing lTT~5l . iTTrjI . as follows. 

First, we split the filter g m into K non-overlapping polyphase components 5o,m, 9i,m, ■ ■ ■ , 9K-i,m, defined 

as 

gk,m = gKm+k, for < k < K. (44) 



Intuitively, the polyphase components specified in (1441) are simply downsampled versions of the original 
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filter g m , with the sampling locations of all these polyphase components forming a complete partition. The 
mapping between the filter g m and its polyphase components is one-to-one. To reconstruct g m , we can 
easily verify that, in the z-domain, 

G{z) = G (z K ) + z~ l G x (z K ) + ... z-lK-VGK-iiz*). (45) 

Following the same steps as above, we can also split the sequences u m and b m in Fig. [6] into their respective 
polyphase components u , m , u 1>m , u K -i, m and 6 , m , b 1>m , b K -i, m - 

Proposition 5: Denote by U k {z) and B k {z) (for < k < K) the z-transforms of u k)m and b k ^ m , 
respectively. Then, 

U k {z) = A(z)G k (z), forO<k<K, (46) 

and 

K-l 

V{z) = Bk^Gkiz' 1 ). (47) 
k=o 

Proof: See Appendix IH1 ■ 
The results of Proposition [5] require some further explanations. What equation (l46l ) suggests is an 
alternative implementation of Ga, as shown in Fig. |6(c)| We compute K parallel convolutions between 
the input a m and the polyphase filters {gk,m}- The channel outputs are the polyphase components {uk, m }, 
which can be combined to form the desired output u m . Similarly, it follows from (|47l that G T b can be 
implemented by the parallel filtering scheme in Fig. |6(d)| 

The new implementations in Fig. 6(c) and Fig. |6(d)| are significantly faster than their respective coun- 



terparts. To see this, suppose that the filter g rn has L coefficients. It is easy to see that the original 



implementation in Fig. 6(a) requires 0(KL) arithmetic operations for every pixel in a m . In contrast, each 



individual channel in Fig. 6(c) requires only O (L /K) arithmetic operations (due to the shorter supports of 
the polyphase filters), and thus the total cost of Fig. |6(c)| stays at 0{L) operations per pixel. This represents 
a iT-fold reduction in computational complexities. A similar analysis also shows that Fig. |6(d)| needs K- 
times fewer operations than Fig. |6(b)| Recall that K is the oversampling factor of our image sensor. As we 
operate in highly oversampled regimes {e.g., K = 1024) to compensate for information loss due to one-bit 
quantizations, the above improvements make our algorithms orders of magnitude faster. 

V. Numerical Results 

We present several numerical results in this section to verify our theoretical analysis and the effectiveness 
of the proposed image reconstruction algorithm. 
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Fig. 7. Binary sensing and reconstructions of 1-D light fields, (a) The original light field A(a-), modeled as a linear combination 
of shifted spline kernels, (b) The reconstruction result obtained by the proposed MLE-based algorithm, using measurements taken 
by a sensor with spatial oversampling factor K = 256. (c) An improved reconstruction result due to the use of a larger spatial 
oversampling factor K = 2048. (d) An alternative result, obtained by keeping K = 256 but taking J = 8 consecutive exposures. 



A. 1-D Synthetic Signals 



Consider a 1-D light field \{x) shown in Fig. |7(a)| The interpolation filter <p(x) we use is the cubic 



B-spline function f3s(x) defined in ((3]). We can see that X(x) is a linear combination of the shifted kernels, 
with the expansion coefficients {c n } shown as blue dots in the figure. 

We simulate a binary sensor with threshold q = 1 and oversampling factor K = 256. Applying the 
proposed MLE-based algorithm in Section JV] we obtain a reconstructed light field (the red dashed curve) 
shown in Fig. |7(b)[ together with the original "ground truth" (the blue solid curve). We observe that the 
low-light regions are well-reconstructed but there exist large "overshoots" in the high-light regions. 

We can substantially improve the reconstruction quality by increasing the oversampling factor of the 
sensor. Fig. |7(c)| shows the result obtained by increasing the spatial oversampling factor to K = 2048. 
Alternatively, we show in Fig. |7(d)| a different reconstruction result obtained by keeping the original 
spatial oversampling factor at K = 256, but taking J = 8 consecutive exposures. Visually, the two sensor 
configurations, i.e., K = 2048, J = 1 and K = 256, J = 8, lead to very similar reconstruction performances. 
This observation agrees with our previous theoretical analysis in Section III-DI on the equivalence between 
spatial and temporal oversampling schemes. 
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Fig. 8. High dynamic range photography using the binary sensor, (a) A sequence of images taken inside of a church with decreasing 
exposure times 1221 . (b) The reconstructed high dynamic range radiance map (in logarithmic scales) using our MLE reconstruction 
algorithm, (c) The tone-mapped version of the reconstructed radiance map. 



B. Acquiring Scenes with High Dynamic Ranges 

A well-known difficulty in photography is the limited dynamic ranges of the image sensors. Capturing 
both very bright and very dark regions faithfully in a single image is difficult. For example, Fig. |8(a)| shows 
several images taken inside of a church with different exposure times 11221 . The scene contains both sun-lit 
areas and shadow regions, with the former over a thousand times brighter than the latter. Such high dynamic 
ranges are well-beyond the capabilities of conventional image sensors. As a result, these images are either 
overexposed or underexposed, with no single image rendering details in both areas. In light of this problem, 
an active area of research in computational photography is to reconstruct a high dynamic range radiance 
map by combining multiple images with different exposure settings (see, e.g., E2l . ||23l ). While producing 
successful results, such multi-exposure approaches can be time-consuming. 

In Section IIII-CL we have shown that the binary sensor studied in this work can achieve higher dynamic 
ranges than conventional image sensors. To demonstrate this advantage, we use the high dynamic range 
radiance map obtained in E2l as the ground truth data [i.e., the light field A(x) as defined in CD)], and 
simulate the acquisition of this scene by using a binary sensor with a single photon threshold. The spatial 
oversampling factor of the binary sensor is set to 32 x 32, and the temporal oversampling factor is 256 {i.e., 
256 independent frames). Similar to our previous experiment on 1-D signals, we use a cubic B-spline kernel 
{i.e., ip(x) = ,$3(x)] along each of the spatial dimensions. Fig. |8(b)| shows the reconstructed radiance map 
using our algorithm described in Section [TV] Since the radiance map has a dynamic range of 3.3 x 10 5 : 1, 



the image is shown in logarithmic scale. To have a visually more pleasing result, we also shown in Fig. |8(c) 
a tone-mapped ll23l version of the reconstruction. We can see from Fig. |8(b)| and Fig. 8(c) that details in 
both light and shadow regions have been faithfully preserved in the reconstructed radiance map, suggesting 
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Fig. 9. Image reconstruction from the binary measurements taken by a SPAD sensor 1171 . with a spatial resolution of 32 x 32 
pixels. The final image (lower-right comer) is obtained by incorporating 4096 consecutive frames, 50 of which are shown in the 
figure. 

the potential application of the binary sensor in high dynamic range photography. 

C. Results on Real Sensor Data 

We have also applied our reconstruction algorithm to images taken by an experimental sensor based 
on single photon avalanche diodes (SPADs) ifTTl . The sensor has binary-valued pixels with single-photon 
sensitivities, i.e., the quantization threshold is q = 1. Due to its experimental nature, the sensor has limited 
spatial resolution, containing an array of only 32x32 detectors. To emulate the effect of spatial oversampling, 
we apply temporal oversampling and acquire 4096 independent binary frames of a static scene. In this case, 
we can estimate the light intensity at each pixel independently by using the closed-form MLE solution in 
(28). Fig. [9] shows 50 such binary images, together with the final reconstruction result (at the lower-right 
corner). The quality of reconstruction verifies our theoretical model and analysis. 

VI. Conclusions 

We presented a theoretical study of a new image sensor that acquires light information using one-bit 
pixels — a scheme reminiscent of traditional photographic film. By formulating the binary sensing scheme 
as a parameter estimation problem based on quantized Poisson statistics, we analyzed the performance of 
the binary sensor in acquiring light intensity information. Our analysis shows that, with a single-photon 
quantization threshold and large oversampling factors, the binary sensor performs much like an ideal sensor, 
as if there were no quantization. To recover the light field from binary sensor measurements, we proposed 
an MLE-based image reconstruction algorithm. We showed that the corresponding log-likelihood function 
is always concave, thus guaranteeing the global convergence of numerical solutions. To solve for the MLE, 
we adopt a standard gradient method, and derive efficient implementations using fast signal processing 
algorithms in the polyphase domain. Finally, we presented numerical results on both synthetic data and 
images taken by a prototype sensor. These results verify our theoretical analysis and demonstrate the 
effectiveness of our image reconstruction algorithm. They also point to the potential of the new binary 
sensor in high dynamic range photography applications. 
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Appendix 



A. Proof of Proposition \J\ 



The sequence s m in (fT6l) can be written, equivalently, as s m = J2j=o Xm=o s j> Sm-jn-j, where 5i is 
the Kronecker delta function. Taking z-transforms on both sides of the equality leads to 



J-i 



M-l 



J-l 



(48) 



s(z) = E s ^^" Jn = E ^(* J )- 

j=0 n=0 j=0 

By substituting (031 ) into (l48l and using the definition (fTTl) . we can simplify (|48T ) as 

S(z) = C(z KJ )G(z). (49) 
Finally, since C(z KJ ) is the z-transform of the sequence J2 n c n^m~KJn, it follows from (|49l that s m = 

(En C « Sm-KJn) * 9m, and thus (Hi. 



B. The CRLB of Binary Sensors 

We first compute the Fisher information, defined as 1(c) = E[— log£f,(c)]. Using ((221) . we get 



/(c) = E 



E 



-^(#1 log Pi (c/tf) + {K-K X ) log Po (c/K) 
K x (p>>(c/K) Pl (c/K) + p' {c/Kf) (K - K,) (p>>(c/K)p (c/K) - p' (c/K] 



(50) 



K 2 Pl (c/K) 2 K 2 p (c/K) 2 

where p' (x) = ^Po(x) and p'q(x) = -^pq(x) are the first and second order derivative of po{x), respectively. 
In reaching ( [50l , we have also used the fact that p\(x) = 1 — po(x) and thus p' (x) = — -^Pi(x) and 

Po( x ) = --&Pi( x )- 

Note that K\ = Eo<m<ft: is a binomial random variable, and thus its mean can be computed as 

E[Ki] = Kpi(c/K) = K(l - po(c/K)). 

On substituting the above expression into (|50l , the Fisher information can be simplified as 

p^c/K) Pl (c/K) + p' (c/K) 2 pZ(c/K) Po (c/K) - p' (c/K) 2 



1(c) 



K Pl (c/K) 
p' (c/K) 2 



K Po (c/K) 



(51) 



K Po (c/K) Pl (c/K) 

Using the definition of po(x) in (fT2l) . the derivative in the numerator of (1511 can be computed as 

x q-l 

Finally, since CKLB bin (K,q) = I /1(c), we reach (|23]> by substituting ([12]) and (|52]> into (ED, and after 
some straightforward manipulations. 



(52) 



YANG et ah: BITS FROM PHOTONS: OVERSAMPLED IMAGE ACQUISITION USING BINARY POISSON STATISTICS 



27 



C. The CRLB of Ideal Unquantized Sensors 

Without quantization, the sensor measurements are y = [yo, yi, . . . , yK-i] T , ie., the number of photons 
collected at each pixel. The likelihood function in this case is 

C y {c) = F(Y m = y m ,0<m<K; c), 

= n (c/g) T e/K . <*> 

where (|53l l follows from the independence of {Y m } and the expressions ([TTl ) and (fl9l ). 
Computing the Fisher information /(c) = E[— log C y {c)] in this case, we get 



/(c) = E 



pa K ~ l 

{y m log(c/K) - c/K - log(y m !)) 



dc 2 

m=0 



K-l 
m=0 



i\ — i 

[Y,ym\/c 2 . (54) 



Since {y m } are drawn from Poisson distributions as in (HD, we have E[y m ] = s m = c/K for all m. It then 
follows from (ED that /(c) = K{c/K)/c 2 = 1/c, and therefore CRLB idea i(/C) = l//(c) = c. 

D. Proof of Proposition \3\ 

Using (Hp) , (IBTI ) and (l52l ). and through a change of variables c/K ^ x, we have 

CRLB bin (K, g)/CRLB ideal (^) = ^J^^J^ - (55) 

It follows from the properties of incomplete gamma functions that po(x) = r~rji f^°t q ~ 1 e~ t dt and 
pi(x) = rprj] Jo C t g " 1 e" f (it. Using a change of variables i — )• we can further rewrite po(^) as 

PoW = T^IJ! Jo^t)* 6 "^ f ■ 11 follows that 

1 f-x ( x 2 \ Q -4- dt\ ( 1 C x +Q-l„-t 



x 2q-l e -2x/( q _ 1)12 x 2 i- l e- 2x /(q - l)! 2 



xe 2x 



X 



t- q - 1 e- s rdt / t q - x e- l dt 
Jo 



> xe 2x Qf t~ x e- « "I di^ , (56) 

where in reaching (|56l we have used the Cauchy-Schwarz inequality. 

It is easy to verify (through a change of variables t — > ^-) that J i _1 e _r « _ 2 (# = t~ e~^t~? dt. 

/ x 2 t x2 

Consequently, the term on the right-hand side of (l56l is equal to xe 2x ( ^ J/^t~ l e~ dt) . Through a 
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change of variables t — > t 2 , we have, 



xe 



1 /■oo , \ 2 / poo , 2 \ 2 

' t^e-^-Ut) = xe 2x / rh-^-Tdt 



2 



o / \Jo 



fOO 2 \ • 



-oo^/4 + « 2 /: 



v 

: du , (57) 



where (l57l is obtained through another change of variables t — ^f + y^ + x- 

We can easily verify that (|57T ) is a monotonically increasing function with respect to x > 0. So, for 
x > 1, (1571 ) is greater than 



oo 



e -3 u /^4: + u 2 du) « 1.31. (58) 



For < x < 1, we can obtain the following inequalities by keeping the first two terms in po(x) and the 
first term in p\{x): 

p (x) Pl (x) (1 + x)e~ 2x x c '/q\ (q - l)!^ 1 ""? + x 2 ^) 2(q - 1)1 



x 2q-l e -2x/( g _ ^12 - x 2q-l e -2z/( q _ 1)|2 

It is easy to see that 2 ^ q ~ 1 ^ is a monotonically increasing function with respect to q > 2. Therefore, 

2(a — IV 4 

^ > - 1.33 for q > 3. (59) 

g 3 

Finally, for q = 2, we keep the first two terms in P o(x) and pi(cc) and get 

Po (x) Pl (x) (l + x)e- 2x (x 2 /2 + x 3 /6) . , 1 

>- '—L = x /6 + 2/3 + — > 1-33 for0<x<l. (60) 



x 2q-i e -2xj^ q _ iy2 - x 3 e~ 2x 2x 

Combining (T58T ). (|59l and (l60l . we reach the desired result in (l26b . 

Finally, we show that, for q > 2, the "gap" between CRLB b i n (ET, q) and CRLBid ea i will only get bigger 
as the oversampling factor K grows. To that end, we notice that when K — > oo, the variable x = c/K — > 0. 
Keeping the first terms in po(x) and pi(x), we have 

Po(x)pi(x) ^ e~ 2x x q /ql (g— l)!x 1_9 



x 2q-l e -2x/( q _ 1)12 - x 2q-l e -2x/( q _ j)|2 

For q >2, the above quantity goes to infinity as x — )• 0. Therefore, 



lim CRLB bin (A',g)/CRLB ideal (K) = oo. 

K— »oo 
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E. Proof of Theorem \J\ 

When q = 1, we have po(x) = e~ x and thus p^ (x) = — log(x). In this case, the MLE solution in ([28 



can be rewritten as 

-mog(l - Kt/K), if < Kt < K{1 - e~ s / K ), 
S, otherwise. 



cml(&) 



We note that -K\og(l - K x /K) = K x + § + ^ + . . . and that lim^oo K(l - e" s l K ) = S. Thus, 
for sufficiently large K, the above MLE solution can be further simplified as 

cudb) = { K (61) 

5, otherwise. 

Without loss of generality, we assume that S is an integer in what follows. The expected value of the MLE 
then becomes 

E[c ML (f>)] = e nF ( Ki = n )+ s Yl ¥ ( Ki = n ) + °o-/ K )- 

n=0 n=S 

Using the following identity c = Yl™=o n C "nl " a bout the mean of a Poisson random variable, we have 

5—1 / „ „ r \ K oo 



|E[c ML (b)] - c| = \J2 n ( P(Ki = n)- ) + S E F (^i = ~ E + ° ( W 

n.=o ^ n. y ^ =s , n. 

5— 1 / n — c \ A" oo n _ c 

<^|E = ») - | + s E p ( J ^ = ») + £ t^tv + °(V*0- (62) 

In what follows, we derive bounds for the quantities on the right-hand side of the above inequality. First, 
consider the probability f(Ki = n). Since K\ is a binomial random variable, we have 

¥(K X = n)= (1 - p (c/K)r P o(c/K)( K ^ 

= K(K-l). (K-n + 1) _ _ cjK (K _ n)/K ^ 
n\K n 

For every n < S, it is easy to verify that K ^ K ~^K-n+i) = 1+(D (i/ K ^ (K(l-e~ c / K )) n = c n + 0(l/K) 
and e - c ( K ~ n )/ K — e ~ c _|_ 0(1/K). Thus, for any n < S, we can simplify d63l ) as 



F(Ki = n) = —e~ c + OQ/K). (64) 
n! 



It follows that 

s\ 



n=0 



E ( p (^i = *) - ^r) | = °(V^o- (65) 
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Next, consider the second term on the right-hand side of (l62l . 

K S-l 
S ^ W(Ki =n) = S(l-J2 P(#i = n)) 

n=Q 
5-1 n 

+ (66) 

/ — ' n! 

n=0 

oo n 



n=S n=0 

S-l 



n=5 

<5e- c (|) S + O(l/J0, (67) 



for all c < S, where (1661 ) follows from (l64l i and the inequality (1671 1 is due to the Chernoff bound on the tail 
of Poisson distributions |[24l . Similarly, the third term on the right-hand side of (l62l ) can be rewritten as 



, s-i 

ec 



where the inequality is again an application of the Chernoff bound. Finally, on substituting d65T ), (|67T ) and 
into d62l) , and after some simple manipulations, we reach d29"l ). 
The proof for the mean-squared error formula (fMB is similar. Using (l6Tb . we have 

S-l K 



E [(c ML (b) - c) 2 ] = - c ) 2p (^i = n) + (5 - c) 2 £ W = «) + C(V^) 

n=0 n=S 

= £(" - c ) 2 ^r + ( 5 " c ) 2 E + °(Vn (69) 



ra! ' n! 

n=0 n=S 



where in reaching (|69l , we have used the estimation (l64l i of the Binomial probabilities. We note that the 
variance of a Poisson random variable is equal to its mean. Thus, c = X]^o( n ~~ c) 2 c "^, - . On combining 
this identity with d69l . 



E [(?ml(6) - c) 2 ] - c = | (S - c) 2 £ - £ (n - c) 2 ^ + 0(l/tf) 

n=S n=S 

<2V n 2 ^ + 0(l/X), forc<5 
z — ' n! 

n=S 

<2c(c + l) V + forc<5-2. 

z — ^ n! 

n=5-2 

Applying the Chernoff bound to the above inequality, we get (l30l) . 
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F. Proof of Proposition |?] 



We have Po (x) = e~ x YXX £p and thus 1 " Po(x) = e~ x £~ ff- 11 follows that 



K(l- Po (S/K))=Ke- s / K (J^ + 



-q n\ 
g q +l 



-S/K 



S'-i 



S q+1 



+ 



+ 



Ki~ l q\ Ki(q + 1)\ 

For q > 2 and any fixed constant S, the above quantity converges to as A tends to infinity. As a result, 
for sufficiently large K, the MLE solution in d28l) can be simplified as 



cml(&) 



0, if K\ = 0, 
S, otherwise, 



(70) 



where we have also used the fact that po(0) = 1 and thus p (1) = 0. Using (1701) . we can compute the 
expected value of the MLE as 



E[cml(&)] = OViKt = 0) + 5(1 - P(Ki = 0)) = S(l - P(#i = 0)). 



(71) 



We have if i = when all the pixel responses are uniformly 0. The probability of seeing such an event is 

\K 



P(Ki = 0) = p (c/K) K = e" c 1 + - + ... + 



A" 



K<i-\q- 1)! 



which converges to 1 as A tends to infinity, i.e., 



lim P(Ai = 0) = 1. 



(72) 



Substituting d72]) into (TTTb . we get (|3T1) . 
Next, we compute the MSE as 

E [(c ML (b) - c) 2 ] = 
which, upon taking the limit as A — > oo, leads to (|32~1) . 



c 2 P(Ax = 0) + (S - c) 2 (1 - P(A X = 0)) , 



G. Proof of Lemma |2] 

The function h(x) == log Ylk=i ^~W~ ^ s continuously differentiable on the interval (0,oo). Therefore, to 
establish its concavity, we just need to show that its second derivative is nonpositive. To that end, we first 
introduce a sequence of functions {^fc(^)}fc 6 2u{oo}' defined as 



r k (x) 



del' 



x k /k\, if < k < oo; 



(73) 



0. 



if k < or k = oo. 
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It is straightforward to verify that -azT^x) = r k _i{x) for all k G Z U {00} . Now, rewriting as 
log Ylk=i r k{ x ) — x and computing its second derivative, we get 

d 2 ur„\ _ (Efc=i T, fc-2)(Efc=i r fc) ~ (EL^fc-i) 2 r74 , 
dx (Ek=t r kr 
where we have omitted the function argument x in r k (x), r k -i(x) and rk-2{x) for notational simplicity. 

Recall that our goal is to show that jj^ihix) < 0, for x > 0. Since the denominator of (l74l ) is always 

positive, we just need to focus on its numerator. Using the identities Yli<k<j r k = Yl<i<k<j r k-\ + r j — r i 

an d Yl,i<k<j r k~i = Yli<k<j r k-2 + fj—i — Ti-i, we can simplify the numerator of (l74T i as follows: 

rfc " 2 ) ( Yl rk ~ 1 + r J ~ ri ) ~ ( X! r *-i) ( Y rk ~ 2 + r i-i ~ ri " x 

i<k<j i<k<j i<k<j i<k<j 

= ^ [iTk-^Tj - r k -irj-i) + (r fe _iri_i - r fc _ 2 ri)). (75) 

i<fc<7 

In what follows, we show that 

r fe _ 2 (a;)r i (a;) - r fc _i(x)rj_i(a;) < (76) 

for arbitrary choices of x > and i < k < j, where 0<i<j<ooor0<i<j = oo. Note that, when 
k < 2 or j = 00, the left-hand side of d76l) becomes — rk-i(x)rj-i(x) and thus d76l) automatically holds. 
Now, assume that k > 2 and j < 00. From the definition in (l73l ). the left-hand side of (1761 ) is 

^ x^ ^ x kJr ^ 2 Z' 1 1 



(fc- 2)!i! (Ar-l)!(i-l)! (fc - 2)!(j - 1)! V j k — 1 
fori<k< j. Using similar arguments, we can also show that 

rk-i(x)n~i(x) - r k -2(x)r.i(x) < 0, for x > 0. (77) 

On substituting the inequalities d76l ) and (1771 ) into d75T ). we verify that the numerator of (T74T i is nonpositive, 
and therefore -^jh^x) < 0, for all x > 0. 

//. Proof of Proposition \5\ 

Expressing the signal processing operations in Fig. |6] in the z-domain, we have 

U(z) = A{z K )G{z) 

= A(z K )G (z K ) + z- 1 A(z K )G l (z K ) + ... + z^ K - 1 ^A(z K )G K . l (z K ), (78) 

where A(z K ) in the first equality is the z-transform of the K-times upsampled version of a m , and (|78T ) 
follows from (l45T ). Similar to (l45l . we can expand C7 (z) in terms of the z-transforms of its polyphase 
components, as 

U(z) = U (z K ) + z~ x XJ x {z K ) + ... + z^ K ^U K ^(z K ). (79) 
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Comparing (1781 1 against (1791 ) and using the uniqueness of the polyphase decomposition, we conclude that 
U k {z) = A{z)G k (z), for all < k < K. 

Now, consider Fig. |6(b) We note that the z-transform of g_ m is G(z~ 1 ). Denote by d m the output of 



the filtering operation. Then, its z-transform can be computed as 

D(z) = B{z)G{z~ l ) 

\k=Q ) \fc=0 / 

K-l 

= Y J B k (z K )G k {z~ K )+ ^ t B i (z K )G,(z K ). (80) 

k=0 0<i^j<K 

When downsampling d m by K, only the first term on the right-hand side of (l80l will be retained; the 
second term is "invisible" to the sampling operation due to mismatched supports. Therefore, we have, after 
downsampling, V(z) = Y, k =o B h{z)G k {z' x ). 
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